2 Targets Pipeline Overview
The targets pipeline automates and organizes the data analysis workflow in this project. Below is an overview of the pipeline structure and its components.
2.1 Loading the Targets Package
# Install targets if it is not already available, then load it
if (!requireNamespace("targets", quietly = TRUE)) {
  install.packages("targets")
}
library(targets)
2.2 Example Targets Workflow
# Define targets
list(
  # Target 1: Paths to annotation files
  tar_target(
    funct_annotations_path,
    here::here("data-raw/meat_ref_db/f3m_meat_genes_catalog_20241211_funct_annotations.tsv"),
    format = "file"
  ),
  tar_target(
    gtdb_classification_path,
    here::here("data-raw/meat_ref_db/meat_genes_catalog_gtdb_classification.tsv"),
    format = "file"
  ),
  # Target 2: Build reference database
  tar_target(
    meat_ref_db,
    build_ref_db(
      funct_annotations_path = funct_annotations_path,
      gtdb_classification_path = gtdb_classification_path
    )
  ),
  # Target 3: Path to folder with sample files
  tar_target(
    folder_path,
    here::here("data-raw/capfood"),
    format = "file"
  ),
  # Target 4: Import multiple sample counts
  tar_target(
    all_sample_counts,
    import_multiple_samples(folder_path = folder_path)
  ),
  # Target 5: Define aggregation levels
  tar_target(taxonomic_level, "genus"),
  tar_target(functional_level, "food_microbiome_metabolic_function"),
  # Target 6: Aggregate counts
  tar_target(
    aggregated_counts,
    aggregate_counts(
      all_sample_counts = all_sample_counts,
      ref_db = meat_ref_db,
      taxonomic_level = taxonomic_level,
      functional_level = functional_level,
      basal_categories = c(
        "F3MA_RNA_metabolism",
        "F3MB_nucleotide_metabolism",
        "F3MD_DNA_metabolism"
      )
    )
  ),
  # Target 7: Build count matrix
  tar_target(
    count_matrix,
    build_count_matrix(
      aggregated_counts = aggregated_counts,
      deseq2 = TRUE
    )
  ),
  # Target 8: Build basal metabolism matrix
  tar_target(
    basal_metabolism_matrix,
    build_basal_matrix(
      aggregated_counts = aggregated_counts
    )
  )
)
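The targets in this list are linked implicitly: targets statically analyzes each command and treats any symbol that matches another target's name as an upstream dependency. The sketch below only approximates that idea with base R's all.vars(); it is not how targets implements the analysis internally, and it is included purely to illustrate why meat_ref_db depends on the two annotation path targets.

```r
# Illustrative only: approximate dependency detection with base R.
# targets performs its own, more thorough static analysis internally.
target_names <- c(
  "funct_annotations_path", "gtdb_classification_path", "meat_ref_db",
  "folder_path", "all_sample_counts", "taxonomic_level", "functional_level",
  "aggregated_counts", "count_matrix", "basal_metabolism_matrix"
)

# Command of the meat_ref_db target, captured unevaluated
command <- quote(
  build_ref_db(
    funct_annotations_path = funct_annotations_path,
    gtdb_classification_path = gtdb_classification_path
  )
)

# all.vars() returns the variable symbols used in the expression;
# function names such as build_ref_db are excluded by default
upstream <- intersect(all.vars(command), target_names)
print(upstream)
```

Because dependencies come from symbol names, renaming a target requires updating every command that refers to it.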
2.3 Pipeline Visualization
The following plot provides a visual representation of the pipeline dependencies. It shows how the targets are connected and the flow of data processing.
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
here() starts at /work_projet/synthplex/f3mr
ℹ Loading f3mr
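The code that produced the dependency plot is not shown in this section. One way to draw such a graph is tar_visnetwork() from targets, which needs the visNetwork package; whether the original page used this function is an assumption. The sketch below only draws the graph when both packages and a _targets.R file are present in the working directory.

```r
# Assumption: this is run from the project root, where _targets.R lives.
can_plot <- requireNamespace("targets", quietly = TRUE) &&
  requireNamespace("visNetwork", quietly = TRUE) &&
  file.exists("_targets.R")

if (can_plot) {
  # targets_only = TRUE hides functions and global objects,
  # leaving only the targets and the edges between them
  graph <- targets::tar_visnetwork(targets_only = TRUE)
  graph
} else {
  message("Skipping plot: targets, visNetwork, or _targets.R is missing.")
}
```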
2.4 Pipeline Summary
The pipeline consists of the following targets:
tar_manifest()
2.4.1 Key Targets
- Reference Database (meat_ref_db): Builds the reference database for functional and taxonomic annotations.
- All Sample Counts (all_sample_counts): Imports and combines all sample data.
- Aggregated Counts (aggregated_counts): Aggregates counts at specified taxonomic and functional levels.
- Count Matrix (count_matrix): Constructs a matrix for downstream analysis.
2.5 Run the Pipeline
To execute the entire pipeline, run the following command in R:
tar_make()
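tar_make() only rebuilds targets that are out of date, so a second run after a successful build should skip everything. The toy example below is a sketch of that behavior in a throwaway directory; the targets x and y are hypothetical and unrelated to this project's pipeline.

```r
library(targets)

# Work in a temporary directory so the real project store is untouched
demo_dir <- tempfile("targets_demo_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Write a two-target toy _targets.R (x and y are hypothetical targets)
tar_script({
  library(targets)
  list(
    tar_target(x, 1 + 1),
    tar_target(y, x * 2)
  )
}, ask = FALSE)

tar_make()                        # first run builds x, then y
outdated_after <- tar_outdated()  # names of targets that would rerun now
y_value <- tar_read(y)            # value stored for target y
tar_make()                        # second run skips both targets

# Restore the original working directory and clean up
setwd(old_wd)
unlink(demo_dir, recursive = TRUE)
```

In the real project, calling tar_outdated() before tar_make() gives a preview of which targets a run would rebuild.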
2.6 Further Exploration
Explore the output of specific targets:
# Load a specific target
target_output <- tar_read(count_matrix)
# Preview the first five rows and columns
target_output[1:5, 1:5]
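Indexing rows and columns 1:5 directly raises a "subscript out of bounds" error when a matrix has fewer than five rows or columns. A more defensive preview can be sketched as follows; preview_corner is a hypothetical helper, not part of targets or of this project, and toy_matrix stands in for the real count matrix.

```r
# Hypothetical helper: return at most n rows and n columns of x
preview_corner <- function(x, n = 5) {
  rows <- seq_len(min(n, nrow(x)))
  cols <- seq_len(min(n, ncol(x)))
  x[rows, cols, drop = FALSE]
}

# Stand-in for tar_read(count_matrix): a small 3 x 8 matrix
toy_matrix <- matrix(seq_len(24), nrow = 3, ncol = 8)
corner <- preview_corner(toy_matrix)
dim(corner)
```

With the real pipeline output, the call would be preview_corner(tar_read(count_matrix)).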